EC3: Combining Clustering and Classification for Ensemble Learning
Classification and clustering algorithms have proven successful individually
in different contexts. Both have their own advantages and limitations. For
instance, although classification algorithms are more powerful than clustering
methods at predicting class labels of objects, they do not perform well when
there is a lack of sufficient reliable, manually labeled data. On the other
hand, although clustering algorithms do not produce label information for
objects, they provide supplementary constraints (e.g., if two objects are
clustered together, it is more likely that the same label is assigned to both
of them) that one can leverage for label prediction of a set of unknown
objects. Therefore, systematically using both types of algorithms together can
lead to better prediction performance. In this paper, we propose a novel
algorithm, called EC3, that merges classification and clustering in order to
support both binary and multi-class classification. EC3 is based on a
principled combination of multiple classification and multiple clustering
methods using an optimization function. We theoretically show the convexity
and optimality of the problem and solve it by the block coordinate descent
method. We additionally propose iEC3, a variant of EC3 that handles imbalanced
training data. We perform an extensive experimental analysis by comparing EC3
and iEC3 with 14 baseline methods (7 well-known standalone classifiers, 5
ensemble classifiers, and 2 existing methods that merge classification and
clustering) on 13 standard benchmark datasets. We show that our methods
outperform the other baselines on every single dataset, achieving up to 10%
higher AUC. Moreover, our methods are faster (1.21 times faster than the best
baseline) and more resilient to noise and class imbalance than the best
baseline method.
Comment: 14 pages, 7 figures, 11 tables
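The abstract does not state EC3's objective, so the following is a deliberately simplified, hypothetical stand-in rather than the authors' formulation. It only illustrates how a convex objective that mixes averaged classifier outputs with a "same cluster, similar label" term can be solved by block coordinate descent. Each object i gets a class-probability vector u_i, and the sketch minimises sum_i ||u_i - p_i||^2 + alpha * sum over clusterings of ||u_i - q_c||^2, where p_i is the averaged classifier output (assumed given) and q_c is the centre of the cluster containing i. The names `consensus_bcd`, `alpha`, and `iters` are illustrative.

```python
from typing import Dict, List

Vector = List[float]


def consensus_bcd(
    probs: List[Vector],           # p_i: averaged classifier class-probabilities per object (assumed input)
    clusterings: List[List[int]],  # each inner list gives a cluster id for every object
    alpha: float = 1.0,            # hypothetical weight of the cluster-agreement term
    iters: int = 50,
) -> List[Vector]:
    """Minimise sum_i ||u_i - p_i||^2 + alpha * sum_g sum_i ||u_i - q_{g,c_g(i)}||^2.

    Illustrative objective only; this is NOT the EC3 formulation from the paper.
    """
    k = len(probs[0])
    m = len(clusterings)
    u = [list(p) for p in probs]
    for _ in range(iters):
        # Block 1 (U fixed): each cluster centre is the mean of its members' u_i.
        centres: List[Dict[int, Vector]] = []
        for labels in clusterings:
            sums: Dict[int, Vector] = {}
            counts: Dict[int, int] = {}
            for i, c in enumerate(labels):
                acc = sums.setdefault(c, [0.0] * k)
                for t in range(k):
                    acc[t] += u[i][t]
                counts[c] = counts.get(c, 0) + 1
            centres.append({c: [x / counts[c] for x in s] for c, s in sums.items()})
        # Block 2 (centres fixed): closed-form minimiser
        # u_i = (p_i + alpha * sum of its cluster centres) / (1 + alpha * m).
        for i, p in enumerate(probs):
            acc = list(p)
            for g, labels in enumerate(clusterings):
                q = centres[g][labels[i]]
                for t in range(k):
                    acc[t] += alpha * q[t]
            u[i] = [x / (1.0 + alpha * m) for x in acc]
    return u
```

Each block update is the exact minimiser of a convex quadratic with the other block held fixed, so the objective never increases between iterations, and the u_i stay on the probability simplex because every update is a convex combination of probability vectors. For example, with classifier outputs [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]] and one clustering [0, 0, 1], the second object's class-0 score is pulled from 0.4 to 0.525 by its cluster-mate, flipping its predicted label.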
Authorship Identification in Bengali Literature: a Comparative Analysis
Stylometry is the study of the unique linguistic styles and writing behaviors
of individuals. It underlies core text categorization tasks such as
authorship identification and plagiarism detection. Though a reasonable number
of studies have been conducted in English, no major work has been done so far
in Bengali. In this work, we present a demonstration of authorship
identification for documents written in Bengali. We adopt a set of
fine-grained stylistic features for the analysis of the text and use them to
develop two different models: a statistical similarity model consisting of
three measures and their combination, and a machine learning model with
Decision Tree, Neural Network and SVM classifiers. Experimental results show
that SVM outperforms the other state-of-the-art methods under 10-fold
cross-validation. We also validate the relative importance of each stylistic
feature to show that some of them remain consistently significant in every
model used in this experiment.
Comment: 9 pages, 5 tables, 4 pictures
Multi-scale assessment of drought-induced forest dieback
Dissertation submitted in partial fulfilment of the requirements for the degree of Master of Science in Geospatial Technologies.
Drought has intensified over the years and will continue to worsen due to climate change. Existing studies have focused on crops rather than forests. Adverse effects are felt by all flora and fauna, but the impact of the recent droughts on forest ecosystems is still unknown. In contrast to crops and other vegetation, the greater root depth of trees allows them to withstand the immediate impacts of drought. This study aims not only to examine the interaction between drought and forest vitality from a multi-scale and temporal viewpoint, but also to detect the impact of the recent 2018/19 drought on forest vitality based on remote sensing data. Data from the German Drought Monitor were used for the area-wide estimation of drought in Germany. Vegetation indices such as NDVI, derived from MODIS and Sentinel-2A imagery, were used to study the interactions between drought and forest vitality. Data for both drought and NDVI were acquired for the years 2000-2019. The long-term time series were decomposed and seasonally adjusted to improve the cross-correlation analysis between the variables. The cross-correlation was verified using breakpoint estimation, dividing the data into historically observed and test data. The coniferous-dominated Black Forest was used as the study area for a more in-depth analysis.
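The deseasonalise-then-cross-correlate step can be sketched as follows, assuming monthly drought-index (for example SMI) and NDVI series that start in the same month. It uses a simple additive adjustment (subtracting each calendar month's mean) rather than whatever decomposition the study used, omits the breakpoint estimation, and reports how strongly NDVI anomalies follow drought anomalies at lags of 0 to `max_lag` months. The function names and parameters are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def deseasonalize(series: List[float], period: int = 12) -> List[float]:
    """Additive seasonal adjustment: subtract each calendar month's mean (a simplification)."""
    climatology = [mean(series[m::period]) for m in range(period)]
    return [v - climatology[i % period] for i, v in enumerate(series)]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; NaN if either series is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def lagged_correlation(
    drought: List[float], ndvi: List[float], max_lag: int = 3, period: int = 12
) -> Dict[int, float]:
    """Correlate drought anomalies at month t with NDVI anomalies at month t + lag."""
    d, v = deseasonalize(drought, period), deseasonalize(ndvi, period)
    return {lag: pearson(d[: len(d) - lag], v[lag:]) for lag in range(max_lag + 1)}
```

A peak at lag 1 would correspond to the "one month after a severe drought" response reported in the results. With real data the series would typically also be detrended, and the correlation at each lag tested for significance before interpretation.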
Results showed that forest vitality was lowest one month after a severe drought, as indicated by the largest decline in NDVI for all forest types. This was verified using high-resolution Sentinel images, and the largest change corresponds to January 2019. A change in NDVI of over -0.5 occurred across 80.63% of the entire study area; for coniferous, broadleaved, and mixed forests, the corresponding shares were 81.74%, 54.42%, and 84.14%, respectively. Two decades of NDVI and Soil Moisture Index (SMI) data, along with higher-resolution Sentinel images for more accurate area calculation, make this a highly effective approach to assessing the impacts of drought on forest dieback. The methodology and data can be applied across the study area and, with suitable drought indices, can be used to assess drought-induced forest dieback across the globe. However, in-situ analysis with ecological considerations at the individual level could further strengthen the validity of the cross-correlations between forest types and drought.
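Area percentages of this kind come from thresholding a per-pixel NDVI difference. The sketch below assumes co-registered "before" and "after" NDVI values flattened into lists, a forest-type label per pixel, and reads "a change of over -0.5" as a decline larger than 0.5 (after minus before below -0.5). The function name `share_of_decline` and the choice to skip missing pixels (`None`, e.g. cloud-masked) are assumptions, not the study's procedure.

```python
from typing import Dict, List, Optional


def share_of_decline(
    before: List[Optional[float]],
    after: List[Optional[float]],
    forest_type: List[str],
    threshold: float = -0.5,
) -> Dict[str, float]:
    """Percentage of valid pixels per forest type (and 'all') whose NDVI change is below threshold."""
    total: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for b, a, ft in zip(before, after, forest_type):
        if b is None or a is None:  # assumption: missing or cloud-masked pixels are skipped
            continue
        for key in (ft, "all"):
            total[key] = total.get(key, 0) + 1
            if a - b < threshold:
                hits[key] = hits.get(key, 0) + 1
    return {key: 100.0 * hits.get(key, 0) / n for key, n in total.items()}
```

With equal-area pixels, the share of pixels equals the share of area; for reprojected or mixed-resolution rasters, each pixel would instead be weighted by its area.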
Reproducibility self-assessment (https://osf.io/j97zp/): 3, 2, 3, 1, 3 (input data, pre-processing, methods, computational environment, results)
- …